AITopics | approximate nearest neighbor

An algorithm for L1 nearest neighbor search via monotonic embedding

Xinan Wang, Sanjoy Dasgupta

Neural Information Processing Systems | Mar-23-2026, 08:16:12 GMT

Fast algorithms for nearest neighbor (NN) search have in large part focused on $\ell_2$ distance. Here we develop an approach for $\ell_1$ distance that begins with an explicit and exactly distance-preserving embedding of the points into $\ell_2^2$. We show how this can efficiently be combined with random-projection based methods for $\ell_2$ NN search, such as locality-sensitive hashing (LSH) or random projection trees. We rigorously establish the correctness of the methodology and show by experimentation using LSH that it is competitive in practice with available alternatives.
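To make the idea of an exactly distance-preserving embedding from L1 into squared L2 concrete, here is a minimal sketch using the textbook unary encoding for bounded integer coordinates. This is not the construction from the paper, which is not reproduced in this listing. The names `unary_embed`, `MAX_VALUE`, and the toy data are assumptions made for illustration. Because the square root is monotone, the nearest neighbor under L2 among the embedded points is also the nearest neighbor under L1 among the originals, so any L2 method could in principle be run on the embedded points.

```python
import math


def unary_embed(point, max_value):
    """Unary-code each integer coordinate in [0, max_value] as a 0/1 block."""
    bits = []
    for x in point:
        bits.extend([1.0] * x + [0.0] * (max_value - x))
    return bits


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_l2_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_index(points, query, distance):
    """Brute-force nearest neighbor; stands in for an L2 index such as LSH."""
    return min(range(len(points)), key=lambda i: distance(points[i], query))


MAX_VALUE = 8  # assumed upper bound on every coordinate (illustrative)
data = [[3, 1, 7], [5, 2, 6], [8, 8, 0], [1, 1, 6]]
query = [4, 1, 7]

embedded = [unary_embed(p, MAX_VALUE) for p in data]
embedded_query = unary_embed(query, MAX_VALUE)

# L1 distance between originals equals squared L2 distance between embeddings.
pairs_match = all(
    l1_distance(p, query) == squared_l2_distance(e, embedded_query)
    for p, e in zip(data, embedded)
)

l1_nn = nearest_index(data, query, l1_distance)
l2_nn = nearest_index(embedded, embedded_query, math.dist)
print(pairs_match, l1_nn, l2_nn)
```

The unary code grows the dimension by a factor of `MAX_VALUE`, which is one reason a more careful embedding is needed for real-valued or wide-range data.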

information retrieval, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.42)

Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

Danait, Ved, Das, Srijan, Bhore, Sujoy

arXiv.org Machine Learning | Oct-28-2025

Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with broad applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, a central challenge is to design compact sketches that preserve essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE for a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+ρ-η})$ memory by storing only a sublinear ($n^{-η}$) fraction of the total inputs, where $ρ$ is a parameter of the LSH family, and $0<η<1$. Our method supports sublinear query time, batch queries, and extends to the more general Turnstile model. While earlier works have focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+ε} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $ε$ is the approximation error. This, to the best of our knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.

artificial intelligence, machine learning, probability, (15 more...)

arXiv.org Machine Learning

2510.23039

Country:

Asia > India > Maharashtra > Mumbai (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.62)

Chameleon2++: An Efficient Chameleon2 Clustering with Approximate Nearest Neighbors

Singh, Priyanshu, Ahuja, Kapil

arXiv.org Artificial Intelligence | Jan-5-2025

Clustering algorithms are fundamental tools in data analysis, with hierarchical methods being particularly valuable for their flexibility. Chameleon is a widely used hierarchical clustering algorithm that excels at identifying high-quality clusters of arbitrary shapes, sizes, and densities. Chameleon2 is the most recent variant that has demonstrated significant improvements, but suffers from critical failings and there are certain improvements that can be made. The first failure we address is that the complexity of Chameleon2 is claimed to be $O(n^2)$, while we demonstrate that it is actually $O(n^2\log{n})$, with $n$ being the number of data points. Furthermore, we suggest improvements to Chameleon2 that ensure that the complexity remains $O(n^2)$ with minimal to no loss of performance. The second failing of Chameleon2 is that it lacks transparency and it does not provide the fine-tuned algorithm parameters used to obtain the claimed results. We meticulously provide all such parameter values to enhance replicability. The improvement which we make in Chameleon2 is that we replace the exact $k$-NN search with an approximate $k$-NN search. This further reduces the algorithmic complexity down to $O(n\log{n})$ without any performance loss. Here, we primarily configure three approximate nearest neighbor search algorithms (Annoy, FLANN and NMSLIB) to align with the overarching Chameleon2 clustering framework. Experimental evaluations on standard benchmark datasets demonstrate that the proposed Chameleon2++ algorithm is more efficient, robust, and computationally optimal.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2501.02612

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights

Agnur, Bharath Kumar

arXiv.org Artificial Intelligence | Dec-31-2024

This paper presents an advanced mapping system that combines drone imagery with machine learning and computer vision to overcome challenges in speed, accuracy, and adaptability across diverse terrains. By automating processes like feature detection, image matching, and stitching, the system produces seamless, high-resolution maps with minimal latency, offering strategic advantages in defense operations. Developed in Python, the system utilizes OpenCV for image processing, NumPy for efficient computations, and concurrent.futures for parallel execution. ORB (Oriented FAST and Rotated BRIEF) is employed for feature detection, while FLANN (Fast Library for Approximate Nearest Neighbors) ensures accurate keypoint matching. Homography transformations align overlapping images, resulting in distortion-free maps in real time. This automation eliminates manual intervention, enabling live updates essential in rapidly changing environments. Designed for versatility, the system performs reliably under various lighting conditions and rugged terrains, making it highly suitable for aerospace and defense applications. Testing has shown notable improvements in processing speed and accuracy compared to conventional methods, enhancing situational awareness and informed decision-making. This scalable solution leverages cutting-edge technologies to provide actionable, reliable data for mission-critical operations.

computer vision, feature detection, matching, (14 more...)

arXiv.org Artificial Intelligence

2412.2021

Country: Asia > India > Telangana > Hyderabad (0.04)

Genre:

Research Report (0.50)
Overview > Innovation (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(2 more...)

Nearest Neighbor based Greedy Coordinate Descent

Neural Information Processing Systems | Mar-14-2024, 22:43:40 GMT

Modern statistical estimators developed over the past decade have statistical or sample complexity that depends only weakly on the number of parameters when there is some structure to the problem, such as sparsity. A central question is whether similar advances can be made in their computational complexity as well. In this paper, we propose strategies that indicate that such advances can indeed be made. In particular, we investigate the greedy coordinate descent algorithm, and note that performing the greedy step efficiently weakens the costly dependence on the problem size provided the solution is sparse. We then propose a suite of methods that perform these greedy steps efficiently by a reduction to nearest neighbor search. We also devise a more amenable form of greedy descent for composite non-smooth objectives; as well as several approximate variants of such greedy descent. We develop a practical implementation of our algorithm that combines greedy coordinate descent with locality sensitive hashing. Without tuning the latter data structure, we are not only able to significantly speed up the vanilla greedy method, but also outperform cyclic descent when the problem size becomes large. Our results indicate the effectiveness of our nearest neighbor strategies, and also point to many open questions regarding the development of computational geometric techniques tailored towards first-order optimization methods.
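As a rough illustration of the greedy step the abstract refers to, the sketch below runs greedy coordinate descent on a least-squares objective, picking at each iteration the coordinate whose gradient entry has the largest magnitude. Each gradient entry is the inner product of a feature column with the current residual, so the selection is a maximum-inner-product query over the columns. The linear scan here is the part a nearest neighbor structure (for example LSH) would replace. The function name, the least-squares objective, and the toy data are assumptions for illustration, not the authors' implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def greedy_coordinate_descent(columns, y, steps):
    """Minimise 0.5 * ||X w - y||^2, where columns[j] is the j-th column of X."""
    w = [0.0] * len(columns)
    residual = [-t for t in y]  # X w - y at w = 0
    for _ in range(steps):
        # Gradient entry j is <column_j, residual>. This linear scan is the
        # costly greedy step that a nearest neighbor index would approximate.
        grads = [dot(col, residual) for col in columns]
        j = max(range(len(columns)), key=lambda k: abs(grads[k]))
        # Exact minimisation along coordinate j.
        step = grads[j] / dot(columns[j], columns[j])
        w[j] -= step
        residual = [r - step * c for r, c in zip(residual, columns[j])]
    return w


# Toy problem with a sparse solution: y is twice the third column.
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
y = [2.0, 2.0, 2.0]
w = greedy_coordinate_descent(columns, y, steps=3)
print(w)
```

When the columns are normalised to unit length, the largest inner product with the residual corresponds to the smallest Euclidean distance to it, which is what makes a nearest neighbor reduction plausible.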

algorithm, coordinate descent, descent, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

An algorithm for L1 nearest neighbor search via monotonic embedding

Xinan Wang

Neural Information Processing Systems | Mar-12-2024, 10:46:22 GMT

We rigorously establish the correctness of the methodology and show by experimentation using LSH that it is competitive in practice with available alternatives.

machine learning, natural language, projection, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.05)
Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > New York (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.81)

[P] Entity Embed: fuzzy and scalable Entity Resolution using Approximate Nearest Neighbors

#artificialintelligence | Apr-27-2021, 00:15:38 GMT

Entity Embed is based on and is a special case of the AutoBlock model described by Amazon. It allows you to transform entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors. Using Entity Embed, you can train a deep learning model to transform records into vectors in an N-dimensional embedding space. Thanks to a contrastive loss, those vectors are organized to keep similar records close and dissimilar records far apart in this embedding space. Embedding records enables scalable ANN search, which means finding thousands of candidate duplicate pairs of records per second per CPU.

approximate nearest neighbor, entity embed, entity resolution, (2 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Hot papers on arXiv from the past month – July 2020

AIHub | Aug-3-2020, 10:15:02 GMT

Here are the most tweeted papers that were uploaded onto arXiv during July 2020. Results are powered by Arxiv Sanity Preserver. Abstract: Massive language models are the core of modern NLP modeling and have been shown to encode impressive amounts of commonsense and factual information. However, that knowledge exists only within the latent parameters of the model, inaccessible to inspection and interpretation, and even worse, factual information memorized from the training corpora is likely to become stale as the world changes. Knowledge stored as parameters will also inevitably exhibit all of the biases inherent in the source materials.

artificial intelligence, inductive learning, machine learning, (18 more...)

AIHub

Genre: Research Report (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering

Beck, Gaël, Duong, Tarn, Lebbah, Mustapha, Azzag, Hanane, Cérin, Christophe

arXiv.org Machine Learning | Feb-11-2019

In this paper we target the class of modal clustering methods where clusters are defined in terms of the local modes of the probability density function which generates the data. The most well-known modal clustering method is the k-means clustering. Mean Shift clustering is a generalization of the k-means clustering which computes arbitrarily shaped clusters defined as the basins of attraction to the local modes created by the density gradient ascent paths. Despite its potential, the Mean Shift approach is a computationally expensive method for unsupervised learning. Thus, we introduce two contributions aiming to provide clustering algorithms with a linear time complexity, as opposed to the quadratic time complexity for the exact Mean Shift clustering. First, we propose a scalable procedure to approximate the density gradient ascent. Second, our proposed scalable cluster labeling technique is presented. Both propositions are based on Locality Sensitive Hashing (LSH) to approximate nearest neighbors. These two techniques may be used for moderate sized datasets. Furthermore, we show that using our proposed approximations of the density gradient ascent as a pre-processing step in other clustering methods can also improve dedicated classification metrics. For the latter, a distributed implementation, written for the Spark/Scala ecosystem is proposed. For all these considered clustering methods, we present experimental results illustrating their labeling accuracy and their potential to solve concrete problems.
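One way to picture an LSH-approximated gradient ascent step is sketched below: points are hashed into buckets by quantised projections, and each point moves to the mean of its bucket, so the bucket-mates act as approximate neighbors. This is a simplified flat-kernel stand-in written for illustration, not the procedure from the paper. The function names, the single fixed projection direction, and the bucket `width` are assumptions. In practice the directions would typically be drawn at random.

```python
import math


def lsh_key(point, directions, offsets, width):
    """Quantised projections onto each direction form the bucket key."""
    return tuple(
        math.floor((sum(d * x for d, x in zip(direction, point)) + offset) / width)
        for direction, offset in zip(directions, offsets)
    )


def lsh_mean_shift_step(points, directions, offsets, width):
    """Move every point to the mean of the points sharing its LSH bucket."""
    buckets = {}
    for p in points:
        buckets.setdefault(lsh_key(p, directions, offsets, width), []).append(p)
    # One mean per bucket, so a full pass costs time linear in len(points).
    means = {
        key: [sum(coords) / len(group) for coords in zip(*group)]
        for key, group in buckets.items()
    }
    return [means[lsh_key(p, directions, offsets, width)] for p in points]


# Two well-separated toy groups; a fixed direction keeps the run deterministic.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0),
          (11.0, 11.0), (12.0, 11.0), (11.0, 12.0)]
directions = [(1.0, 0.0)]
offsets = [0.0]
shifted = lsh_mean_shift_step(points, directions, offsets, width=5.0)
print(shifted[0], shifted[3])
```

Repeating the step lets points in the same basin collapse toward a common location, at the cost of errors whenever a bucket boundary splits true neighbors.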

dataset, nearest neighbor, nnga, (15 more...)

arXiv.org Machine Learning

1902.03833

Country:

North America > United States > New York (0.04)
North America > United States > Tennessee > Knox County > Knoxville (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

New benchmarks for approximate nearest neighbors

#artificialintelligence | Feb-25-2018, 03:03:10 GMT

One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. You have, say, 1M points in some high-dimensional space. Now, given a query point, can you find the nearest points out of the 1M set? Doing this fast turns out to be tricky. I'm the author of Annoy, which has more than 3,000 stars on GitHub.
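For reference, the exact version of the problem is easy to state in code. What makes it tricky is that this brute-force scan touches every point on every query. The sketch below is a plain baseline plus a recall measure of the kind commonly used to compare an approximate index against exact results. It is not taken from the benchmark itself, and the function names and toy data are assumptions for illustration.

```python
import heapq
import math


def exact_nearest(points, query, k):
    """Indices of the k points closest to query, by a full linear scan."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors that an approximate result found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


points = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (9.0, 9.0)]
query = (0.9, 1.2)

truth = exact_nearest(points, query, k=2)
# A hypothetical approximate answer that found one of the two true neighbors.
approx = [2, 3]
print(truth, recall_at_k(approx, truth))
```

An approximate index trades some of that recall for query time, which is the trade-off such benchmarks plot.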

artificial intelligence, benchmark, nearest neighbor, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.69)
